Online Variance Reduction for Stochastic Optimization
نویسندگان
چکیده
Modern stochastic optimization methods often rely on uniform sampling which is agnostic to the underlying characteristics of the data. This might degrade the convergence by yielding estimates that suffer from a high variance. A possible remedy is to employ non-uniform importance sampling techniques, which take the structure of the dataset into account. In this work, we investigate a recently proposed setting which poses variance reduction as an online optimization problem with bandit feedback. We devise a novel and efficient algorithm for this setting that finds a sequence of importance sampling distributions competitive with the best fixed distribution in hindsight, the first result of this kind. While we present our method for sampling datapoints, it naturally extends to selecting coordinates or even blocks of thereof. Empirical validations underline the benefits of our method in several settings.
منابع مشابه
Variance Reduction for Stochastic Gradient Optimization
Stochastic gradient optimization is a class of widely used algorithms for training machine learning models. To optimize an objective, it uses the noisy gradient computed from the random data samples instead of the true gradient computed from the entire dataset. However, when the variance of the noisy gradient is large, the algorithm might spend much time bouncing around, leading to slower conve...
متن کاملRandomized Block Coordinate Descent for Online and Stochastic Optimization
Two types of low cost-per-iteration gradient descent methods have been extensively studied in parallel. One is online or stochastic gradient descent ( OGD/SGD), and the other is randomzied coordinate descent (RBCD). In this paper, we combine the two types of methods together and propose online randomized block coordinate descent (ORBCD). At each iteration, ORBCD only computes the partial gradie...
متن کاملAccelerated Stochastic Power Iteration
Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires O(1/∆) full-data passes to recover the principal component of a matrix with eigen-gap ∆. Lanczos, a significantly more complex method, achieves an accelerated rate of O(1/ √ ∆) passes. Modern applications, however, motivate methods that only ingest...
متن کاملOnline Variance-reducing Optimization
We emphasize the importance of variance reduction in stochastic methods and propose a probabilistic interpretation as a way to store information about past gradients. The resulting algorithm is very similar to the momentum method, with the difference that the weight over past gradients depends on the distance moved in parameter space rather than the number of steps.
متن کاملStochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. However, in the context of empirical risk minimization, it is often helpful to augment the training set by considering random perturbations of input examples. In this case, the objective is no longer a finite sum, and the main candidate for optimization is the stochas...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1802.04715 شماره
صفحات -
تاریخ انتشار 2018